add kitsune.l10n app for handling content localization #6330

@escattone commented Nov 1, 2024

mozilla/sumo#2053

Notes

This PR introduces a new kitsune.l10n application into SUMO that, for now, automatically creates and manages machine translations of KB articles. It uses an LLM to create each machine translation and, if a prior approved translation exists, the result is heavily influenced by that prior translation. This respect for prior contributor translations was the requirement that drove this approach.

This new kitsune.l10n app is designed to be independent of the apps containing the content that it localizes -- so far, just the kitsune.wiki app. In other words, the kitsune.wiki app knows nothing about -- and imports nothing from -- the kitsune.l10n app. If the kitsune.l10n app were removed, the kitsune.wiki app would continue functioning as usual, just without automatically generated machine translations.

In general, the system is relatively simple and consists of two main components:

  • A real-time component that initiates a Celery task that creates and manages machine translations as needed for the specific KB article that's been updated with a new approved revision.
  • A periodic or "heartbeat" component -- whose interval is configurable -- that continuously initiates the same Celery task (in this case, with no argument) that creates and manages machine translations for all KB articles as needed. This "heartbeat" component typically just manages existing translations -- for example, rejecting machine translations that are outdated or approving machine translations that haven't been reviewed within the grace period -- but also acts as a backup to the real-time component should it fail for any reason.

Both of the main components above call the same Celery task, handle_wiki_localization(), which, in turn, uses the following two core functions:

  • create_machine_translations()
  • manage_existing_machine_translations()
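A minimal sketch of how the task might dispatch to these two functions -- only the function names come from this PR; the bodies here are illustrative stand-ins, not the actual implementation:

```python
# Hypothetical sketch of handle_wiki_localization()'s dispatch logic.
# Function names mirror the PR description; bodies are stubs.

def create_machine_translations(article_id=None):
    # Would create LLM-based translations for one article (real-time path)
    # or for all articles needing them (heartbeat path, article_id=None).
    scope = article_id if article_id is not None else "all"
    return f"create:{scope}"

def manage_existing_machine_translations():
    # Would reject outdated machine translations and approve those not
    # reviewed within the grace period.
    return "manage:existing"

def handle_wiki_localization(article_id=None):
    # Both the real-time and heartbeat components invoke this same task.
    return [manage_existing_machine_translations(),
            create_machine_translations(article_id)]
```

The heartbeat path simply calls the task with no argument, so a single code path serves both components.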

Once the handle_wiki_localization() Celery task has started (in any Celery worker), it cannot be run again (in any Celery worker) until it has finished. This is managed via a Postgres advisory lock, which must be acquired in order to start, and which is released only upon normal completion or an exception. This prevents the (admittedly small) possibility of creating duplicate machine translations, which could occur if two instances of the task ran simultaneously.
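The advisory-lock guard could be sketched as follows. pg_try_advisory_lock and pg_advisory_unlock are real Postgres functions, but the FakeCursor, the lock key, and the context manager here are illustrative stand-ins, not the PR's actual code:

```python
from contextlib import contextmanager

L10N_LOCK_KEY = 123456  # assumed application-chosen 64-bit lock key

@contextmanager
def advisory_lock(cursor, key):
    # pg_try_advisory_lock returns true only if no other session holds the key.
    cursor.execute("SELECT pg_try_advisory_lock(%s)", [key])
    acquired = cursor.fetchone()[0]
    try:
        yield acquired
    finally:
        # Released on normal completion or exception, per the PR description.
        if acquired:
            cursor.execute("SELECT pg_advisory_unlock(%s)", [key])

# Minimal stand-in for a DB cursor so the pattern can be exercised locally.
class FakeCursor:
    def __init__(self):
        self.locked = set()
        self.last = None

    def execute(self, sql, params):
        key = params[0]
        if "try_advisory_lock" in sql:
            self.last = key not in self.locked
            if self.last:
                self.locked.add(key)
        else:
            self.locked.discard(key)
            self.last = True

    def fetchone(self):
        return (self.last,)
```

A second caller sees the try-lock return false and skips the run, which is what prevents two simultaneous instances of the task.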

All of the settings for machine translations can be configured via the Django admin, and any changes take immediate effect. Currently, machine translations can be restricted by locale, by KB article slug, by the group of the KB article approver, and/or by the approval date/time.

By default, this l10n application is disabled.

Local Testing

@akatsoulas @smithellis @emilghittasv -- All of you are already configured to impersonate the GKE dev service account, which provides access to the Vertex AI API.

  • Impersonate the GKE dev service account locally
    • gcloud auth application-default login --impersonate-service-account <gke-dev-sa-email> -- I can send you the email to use via Slack
    • Set your location to the root of the kitsune repo -- cd ~/repos/kitsune
    • Move the impersonated creds into the root -- cp -pr ~/.config/gcloud ./gcloud
  • Add the following to your .env file:
    • GOOGLE_APPLICATION_CREDENTIALS=./gcloud/application_default_credentials.json
    • GOOGLE_CLOUD_PROJECT=moz-fx-sumo-nonprod
  • Bring up your docker environment, run the DB migrations, etc.
  • Go into the admin and configure machine translations. You'll need to do the following:
    • Enable machine translation
    • Specify an LLM model name -- use gemini-1.5-pro-002
    • Select one or more locales
    • Adjust, keep the default, or clear the "approved after" date/time limitation
    • Add any other limitations as desired
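Put together, and assuming the service-account email shared via Slack is in $GKE_DEV_SA_EMAIL and your checkout lives at ~/repos/kitsune, the setup steps above look roughly like this (the docker compose invocations are assumptions about your local workflow):

```shell
# Impersonate the GKE dev service account and stage the credentials.
gcloud auth application-default login --impersonate-service-account "$GKE_DEV_SA_EMAIL"
cd ~/repos/kitsune
cp -pr ~/.config/gcloud ./gcloud

# Point the app at the impersonated credentials and the nonprod project.
cat >> .env <<'EOF'
GOOGLE_APPLICATION_CREDENTIALS=./gcloud/application_default_credentials.json
GOOGLE_CLOUD_PROJECT=moz-fx-sumo-nonprod
EOF

# Bring up the environment and apply migrations (assumed invocations).
docker compose up -d
docker compose exec web ./manage.py migrate
```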
[Screenshot: machine translation settings in the Django admin]

Future Adjustments

  • Reporting -- Create new L10n reporting views/pages
  • Error handling -- Currently, if creating any single machine translation (create_machine_translation()) raises an exception, any pending machine translations are abandoned for the current run of the handle_wiki_localization() Celery task. That task will run again at the next heartbeat, so we'll try again later, but we may want to handle some exceptions explicitly in the future. The challenge is that it's difficult to tell from the source code which exceptions might be raised, so I think we can wait to see what Sentry events we get, if any, and then decide whether it makes sense to handle them. Note that the invoke() method of LangChain's chat model already retries on common, recoverable API exceptions (it's currently configured to retry twice before giving up), so we may not need any LLM API exception handling at all -- we can simply let exceptions be raised and reported by Sentry (the current approach).
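The "retry twice before giving up" behavior described above can be illustrated with a pure-Python sketch -- this is not LangChain's actual implementation, and RecoverableAPIError is a hypothetical placeholder for whatever API exceptions count as recoverable:

```python
# Illustration of a "retry N times on recoverable errors, then re-raise"
# policy, similar in spirit to what LangChain's invoke() does internally.

class RecoverableAPIError(Exception):
    """Stand-in for a transient LLM API failure (rate limit, timeout, etc.)."""

def invoke_with_retry(call, max_retries=2):
    attempts = 0
    while True:
        try:
            return call()
        except RecoverableAPIError:
            if attempts >= max_retries:
                raise  # surfaces to Sentry, per the current approach
            attempts += 1
```

Unrecoverable exceptions pass straight through on the first attempt, so Sentry still sees them immediately.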

Infrastructure Configuration Needed

TODO

  • Add ability to limit machine translation to revisions approved by users within a specific group.
  • Add tests to test_wiki.py that cover the slug, date, and group filtering.
  • Add whatever we need to cover our reporting needs.
  • Add the ability to exclude revisions created by the SUMO L10n Bot on the recent revisions page.
  • Record LLM service calls and make them available in the Django admin.
  • Record wiki revision activity and make it available in the Django admin.
